75 research outputs found
GEMBA-MQM: Detecting Translation Quality Error Spans with GPT-4
This paper introduces GEMBA-MQM, a GPT-based evaluation metric designed to
detect translation quality errors, specifically for the quality estimation
setting without the need for human reference translations. Building on the power
of large language models (LLMs), GEMBA-MQM employs a fixed three-shot prompting
technique, querying the GPT-4 model to mark error quality spans. Compared to
previous works, our method has language-agnostic prompts, thus avoiding the
need for manual prompt preparation for new languages.
While preliminary results indicate that GEMBA-MQM achieves state-of-the-art
accuracy for system ranking, we advise caution when using it in academic works
to demonstrate improvements over other methods due to its dependence on the
proprietary, black-box GPT model.
Comment: Accepted to WMT 202
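The scheme the abstract describes — a fixed few-shot prompt asking an LLM to mark MQM error spans, which are then weighted into a score — can be sketched as follows. The prompt template, label set, and severity weights here are illustrative assumptions, not the paper's released prompt, and no API call is made:

```python
# Illustrative sketch of a GEMBA-MQM-style query. Template, labels, and
# weights are assumptions; only the overall scheme (fixed few-shot prompt ->
# LLM marks MQM error spans -> spans weighted into a score) follows the text.

SEVERITY_WEIGHTS = {"critical": 25, "major": 5, "minor": 1}  # assumed MQM-style weights

def build_prompt(src_lang, tgt_lang, source, translation, shots):
    """Assemble a language-agnostic few-shot prompt asking for error spans.
    shots: list of (source, translation, errors) demonstration triples."""
    parts = [f"Identify translation quality errors ({src_lang} -> {tgt_lang}).",
             "List each error as '<severity>: <span>'.", ""]
    for shot_src, shot_tgt, shot_err in shots:
        parts += [f"Source: {shot_src}", f"Translation: {shot_tgt}",
                  f"Errors: {shot_err}", ""]
    parts += [f"Source: {source}", f"Translation: {translation}", "Errors:"]
    return "\n".join(parts)

def score_from_response(response):
    """Turn 'severity: span' lines from the model into a penalty score."""
    penalty = 0
    for line in response.strip().splitlines():
        severity = line.split(":", 1)[0].strip().lower()
        penalty += SEVERITY_WEIGHTS.get(severity, 0)
    return -penalty  # closer to 0 is better

# Canned model response standing in for a GPT-4 reply:
print(score_from_response("major: fire brigade\nminor: extra comma"))  # -> -6
```

Because the prompt mentions no language-specific instructions beyond the language names, the same template works for new language pairs, which is the language-agnostic property the abstract claims.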
Large Language Models Are State-of-the-Art Evaluators of Translation Quality
We describe GEMBA, a GPT-based metric for assessment of translation quality,
which works both with a reference translation and without. In our evaluation,
we focus on zero-shot prompting, comparing four prompt variants in two modes,
based on the availability of the reference. We investigate nine versions of GPT
models, including ChatGPT and GPT-4. We show that our method for translation
quality assessment only works with GPT-3.5 and larger models. Compared to
results from WMT22's Metrics shared task, our method achieves state-of-the-art
accuracy in both modes when compared to MQM-based human labels. Our results are
valid on the system level for all three WMT22 Metrics shared task language
pairs, namely English into German, English into Russian, and Chinese into
English. This provides a first glimpse into the usefulness of pre-trained,
generative large language models for quality assessment of translations. We
publicly release all our code and prompt templates used for the experiments
described in this work, as well as all corresponding scoring results, to allow
for external validation and reproducibility.
Comment: Accepted in EAMT, 10 pages, 8 tables, one figure
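The two prompting modes the abstract compares — with and without a reference translation — differ only in whether the reference is included in the scoring prompt. A minimal sketch (the wording below is a simplified stand-in, not one of the released templates):

```python
# Sketch of the two GEMBA prompting modes. The paper releases its actual
# templates; this simplified wording only illustrates the reference /
# no-reference distinction in a zero-shot, direct-assessment style prompt.

def gemba_prompt(src_lang, tgt_lang, source, translation, reference=None):
    """Zero-shot scoring prompt; reference mode is optional."""
    ref = f'Reference: "{reference}". ' if reference is not None else ""
    return (f"Score the following translation from {src_lang} to {tgt_lang} "
            f"on a continuous scale from 0 to 100. {ref}"
            f'Source: "{source}". Translation: "{translation}". Score:')

with_ref = gemba_prompt("English", "German", "Good morning.", "Guten Morgen.",
                        reference="Guten Morgen.")
no_ref = gemba_prompt("English", "German", "Good morning.", "Guten Morgen.")
print("Reference" in with_ref, "Reference" in no_ref)  # -> True False
```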
Hybrid machine translation using binary classification models trained on joint, binarised feature vectors
We describe the design and implementation of a system combination method for machine translation output. It is based on sentence selection using binary classification models estimated on joint, binarised feature vectors. In contrast to existing system combination methods, which work by dividing candidate translations into n-grams, i.e., sequences of n words or tokens, our framework performs sentence selection, which does not alter the selected, best translation. First, we investigate the potential performance gain attainable by optimal sentence selection. To do so, we conduct the largest meta-study on data released by the yearly Workshop on Statistical Machine Translation (WMT). Second, we introduce so-called joint, binarised feature vectors which explicitly model feature value comparison for two systems A, B. We compare different settings for training binary classifiers using single, joint, as well as joint, binarised feature vectors. After having shown the potential of both selection and binarisation as methodological paradigms, we combine these two into a framework which applies pairwise comparison of all candidate systems to determine the best translation for each individual sentence. Our experiments confirm that our system outperforms other state-of-the-art system combination approaches. We conclude by summarising the main findings and contributions of our thesis and by giving an outlook on future research directions.
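The two central ideas — joint, binarised feature vectors and pairwise sentence selection — can be sketched as below. The binarisation rule (an indicator bit per feature marking whether system A's value exceeds system B's) and the example features are our reading of the abstract, not the thesis's exact formulation:

```python
# Sketch of joint, binarised feature vectors and pairwise sentence selection,
# assuming each system provides a per-sentence feature vector (e.g. LM score,
# length ratio). The binarisation rule is an assumption based on the abstract.

def joint_binarised(feats_a, feats_b):
    """Joint vector: both systems' raw features plus comparison bits."""
    bits = [1.0 if a > b else 0.0 for a, b in zip(feats_a, feats_b)]
    return feats_a + feats_b + bits

def select_best(candidates, prefer_a):
    """Round-robin pairwise comparison over all candidate systems.
    candidates: list of (name, features); prefer_a(fa, fb) -> True if A wins."""
    wins = {name: 0 for name, _ in candidates}
    for i, (na, fa) in enumerate(candidates):
        for nb, fb in candidates[i + 1:]:
            wins[na if prefer_a(fa, fb) else nb] += 1
    return max(wins, key=wins.get)

# Toy stand-in for the trained binary classifier:
prefer = lambda fa, fb: sum(fa) > sum(fb)
systems = [("sysA", [0.2, 0.9]), ("sysB", [0.5, 0.1]), ("sysC", [0.4, 0.4])]
print(select_best(systems, prefer))  # -> sysA
```

Because the winner of each pairwise comparison is an unmodified candidate sentence, the selected output is always a translation that one of the systems actually produced, which is the non-modifying property the abstract emphasises.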
Can Machine Learning Algorithms Improve Phrase Selection in Hybrid Machine Translation
We describe a substitution-based, hybrid machine translation (MT) system that has been extended with a machine learning component controlling its phrase selection. Our approach is based on a rule-based MT (RBMT) system which creates template translations. Based on the generation parse tree of the RBMT system and standard word alignment computation, we identify potential "translation snippets" from one or more translation engines which could be substituted into our translation templates. The substitution process is controlled by a binary classifier trained on feature vectors from the different MT engines. Using a set of manually annotated training data, we are able to observe improvements in terms of BLEU scores over a baseline version of the hybrid system.
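The classifier-controlled substitution step can be sketched as a toy example. Names and the decision threshold are illustrative; the actual system locates substitution points via RBMT parse trees and word alignments, which are elided here:

```python
# Toy sketch of classifier-controlled snippet substitution into a template
# translation. The real system derives substitution points from the RBMT
# generation parse tree and word alignments; here positions are given directly.

def substitute(template_tokens, snippets, accept):
    """snippets: {position: (candidate_phrase, feature_vector)};
    accept: trained binary classifier deciding whether to substitute."""
    out = list(template_tokens)
    for pos, (phrase, feats) in snippets.items():
        if accept(feats):          # classifier approves this snippet
            out[pos] = phrase
    return out

accept = lambda feats: feats[0] > 0.5   # stand-in for the trained model
template = ["das", "Haus", "ist", "gross"]
snippets = {3: ("riesig", [0.8]), 1: ("Gebaeude", [0.2])}
print(substitute(template, snippets, accept))  # -> ['das', 'Haus', 'ist', 'riesig']
```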
Results from the ML4HMT-12 shared task on applying machine learning techniques to optimise the division of labour in hybrid machine translation
We describe the second edition of the ML4HMT shared task, which challenges participants to create hybrid translations from the translation output of several individual MT systems. We provide an overview of the shared task and the data made available to participants before briefly describing the individual systems. We report on the results using automatic evaluation metrics and conclude with a summary of ML4HMT-12 and an outlook on future work.
Findings of the 2019 Conference on Machine Translation (WMT19)
This paper presents the results of the premier shared task organized alongside the Conference on Machine Translation (WMT) 2019.
Participants were asked to build machine translation systems for any of 18 language pairs, to be evaluated on a test set of news stories. The main metric for this task is human judgment of translation quality. The task was also opened up to additional test suites to probe specific aspects of translation.
Iterative Data Augmentation for Neural Machine Translation: a Low Resource Case Study for English–Telugu
Telugu is the fifteenth most commonly spoken language in the world, with an estimated reach of 75 million people in the Indian subcontinent. At the same time, it is a severely low-resourced language. In this paper, we present work on English–Telugu general domain machine translation (MT) systems using small amounts of parallel data. The baseline statistical (SMT) and neural MT (NMT) systems do not yield acceptable translation quality, mostly due to limited resources. However, the use of synthetic parallel data (generated using back-translation, based on an NMT engine) significantly improves translation quality and allows NMT to outperform SMT. We extend back-translation and propose a new, iterative data augmentation (IDA) method. Filtering of synthetic data and IDA both further boost the translation quality of our final NMT systems, as measured by BLEU scores on all test sets and based on state-of-the-art human evaluation.
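Read schematically, the iterative loop alternates back-translation, filtering, and retraining. The sketch below is our reading of the abstract with every training, translation, and filtering step replaced by a placeholder function:

```python
# Schematic of an iterative data augmentation (IDA) loop as we read the
# abstract: train a target->source model, back-translate monolingual target
# text, filter the synthetic pairs, add them to the parallel data, repeat.
# train / back_translate / keep are placeholders for the real components.

def ida(parallel, mono_tgt, rounds, train, back_translate, keep):
    data = list(parallel)                 # (source, target) pairs
    for _ in range(rounds):
        reverse_model = train([(t, s) for s, t in data])   # target -> source
        synthetic = [(back_translate(reverse_model, t), t) for t in mono_tgt]
        data += [pair for pair in synthetic if keep(pair)]  # filter noisy pairs
    return train(data)                    # final source -> target system

# Stub components, just to check the wiring (the "model" is the data size):
train = lambda pairs: len(pairs)
bt = lambda model, t: "src?" + t
keep = lambda pair: True
print(ida([("hello", "halo")], ["one", "two"], rounds=1,
          train=train, back_translate=bt, keep=keep))  # -> 3
```

Each round can use the improved model from the previous round to produce better synthetic sources, which is what distinguishes the iterative scheme from one-shot back-translation.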
Tumor Heterogeneity in Lymphomas: A Different Breed.
It has long been recognized that cancers represent tissues consisting of heterogeneous neoplastic, as well as reactive, cell populations and that cancers of the same histotype may show profound differences in clinical behavior. With the advent of new technologies and the demands of precision medicine, the investigation of tumor heterogeneity has gained much interest. An understanding of intertumoral heterogeneity in patients with the same disease entity is necessary to optimally guide personalized treatment. In addition, increasing evidence indicates that different tumor areas or primary tumors and metastases in an individual patient can show significant intratumoral heterogeneity on different levels. This phenomenon can be driven by genomic instability, epigenetic events, the tumor microenvironment, and stochastic variations in cellular function and antitumoral therapies. These mechanisms may lead to branched subclonal evolution from a common progenitor clone, resulting in spatial variation between different tumor sites, disease progression, and treatment resistance. This review addresses tumor heterogeneity in lymphomas from a pathologist's viewpoint. The relationship between morphologic, immunophenotypic, and genetic heterogeneity is exemplified in different lymphoma entities and reviewed in the context of high-grade transformation and transdifferentiation. In addition, factors driving heterogeneity, as well as clinical and therapeutic implications of lymphoma heterogeneity, will be discussed.
Towards Automatic Face-to-Face Translation
In light of the recent breakthroughs in automatic machine translation
systems, we propose a novel approach that we term as "Face-to-Face
Translation". As today's digital communication becomes increasingly visual, we
argue that there is a need for systems that can automatically translate a video
of a person speaking in language A into a target language B with realistic lip
synchronization. In this work, we create an automatic pipeline for this problem
and demonstrate its impact on multiple real-world applications. First, we build
a working speech-to-speech translation system by bringing together multiple
existing modules from speech and language. We then move towards "Face-to-Face
Translation" by incorporating a novel visual module, LipGAN for generating
realistic talking faces from the translated audio. Quantitative evaluation of
LipGAN on the standard LRW test set shows that it significantly outperforms
existing approaches across all standard metrics. We also subject our
Face-to-Face Translation pipeline to multiple human evaluations and show that
it can significantly improve the overall user experience for consuming and
interacting with multimodal content across languages. Code, models and demo
video are made publicly available.
Demo video: https://www.youtube.com/watch?v=aHG6Oei8jF0
Code and models: https://github.com/Rudrabha/LipGAN
Comment: 9 pages (including references), 5 figures, Published in ACM Multimedia, 201
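The pipeline the abstract describes composes four stages: speech recognition, text translation, speech synthesis, and LipGAN's lip-synced rendering. A minimal sketch of that composition, with all four stage functions as placeholders for the existing modules the paper brings together:

```python
# Rough sketch of the face-to-face translation pipeline: ASR -> MT -> TTS ->
# LipGAN rendering. All four stages are placeholders; only the composition
# order follows the abstract.

def face_to_face(video, asr, translate, tts, lipgan):
    text_a = asr(video)             # speech in language A -> text A
    text_b = translate(text_a)      # text A -> text B
    audio_b = tts(text_b)           # text B -> speech in language B
    return lipgan(video, audio_b)   # re-render the face, lips synced to B

# Stub stages, just to check the wiring:
out = face_to_face("clip.mp4",
                   asr=lambda v: "good morning",
                   translate=lambda t: "guten morgen",
                   tts=lambda t: "audio:" + t,
                   lipgan=lambda v, a: (v, a))
print(out)  # -> ('clip.mp4', 'audio:guten morgen')
```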
Machine Translation Human Evaluation: an investigation of evaluation based on Post-Editing and its relation with Direct Assessment
In this paper we present an analysis of the two most prominent methodologies used for the human evaluation of MT quality, namely evaluation based on Post-Editing (PE) and evaluation based on Direct Assessment (DA). To this purpose, we exploit a publicly available large dataset containing both types of evaluations. We first focus on PE and investigate how sensitive TER-based evaluation is to the type and number of references used. Then, we carry out a comparative analysis of PE and DA to investigate the extent to which the evaluation results obtained by methodologies addressing different human perspectives are similar. This comparison sheds light not only on PE but also on the so-called reference bias related to monolingual DA. Also, we analyze whether and how the two methodologies can complement each other's weaknesses.
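TER counts the minimum number of word edits needed to turn the MT output into a reference, normalised by reference length. The sketch below omits TER's phrase-shift operation, so it is really a WER-style approximation; it only illustrates why the score depends on which and how many references are used:

```python
# WER-style approximation of TER (phrase shifts omitted), to illustrate the
# reference sensitivity the abstract investigates.

def edit_distance(hyp, ref):
    """Word-level Levenshtein distance between two token lists."""
    prev_row = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        row = [i]
        for j, r in enumerate(ref, 1):
            row.append(min(prev_row[j] + 1,               # deletion
                           row[j - 1] + 1,                # insertion
                           prev_row[j - 1] + (h != r)))   # substitution
        prev_row = row
    return prev_row[len(ref)]

def ter_like(hyp, refs):
    """Score against the closest reference, normalised by its length."""
    return min(edit_distance(hyp.split(), r.split()) / len(r.split())
               for r in refs)

refs = ["the cat sat on the mat", "a cat sat"]
print(round(ter_like("the cat sat", refs), 3))  # -> 0.333
```

With only the first reference, the same hypothesis would score 0.5; adding the second, closer reference halves the penalty, which is the kind of sensitivity a TER-based PE evaluation must account for.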